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Abstract 

Design and operation of a utility scale photovoltaic (PV) power plant depends 
on accurate modeling of the power generated, which is highly correlated with 
aggregate solar irradiance on the plant’s PV modules. At present, aggregate 
solar irradiance over the area of a typical PV power plant cannot be mea¬ 
sured directly. Rather, irradiance measurements are typically available from 
a few, relatively small sensors and thus aggregate solar irradiance must be 
estimated from these data. As a step towards hireling more accurate methods 
for estimating aggregate irradiance from avaialble measurements, we evalu¬ 
ate semiparametric spatio-temporal models for global horizontal irradiance. 
Using data from a 1.2 MW PV plant located in Lanai, Hawaii, we show 
that a semiparametric model can be more accurate than simple intepolation 
between sensor locations. We investigate spatio-temporal models with sepa¬ 
rable and nonseparable covariance structures and find no evidence to support 
assuming a separable covariance structure. 

Keywords: Irradiance, Models, Spatio-temporal model, Nonseparability, 
Lattice data, Semiparametric time series 
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1. Introduction 


Accurate modeling of power output from utility scale photovolaic (PV) 
power plants is often key to obtaining favorable financial terms during sys¬ 
tem design and construction, and to efficient and profitable plant operation. 
Accurate modeling of power output requires estimating aggregate plane-of- 
array (POA) irradiance over the plant’s footprint with sufficient precision, 
because aggregate POA irradiance is highly correlated with power output 


(Kuszamaul et al. 2010) 


There is great interest in methods to improve the accuracy of estimates of 
aggregate irradiance used for PV power plant modeling. Error in estimating 
the aggregate irradiance translates directly to error in modeled power output 
and thus to error in projected energy production; relatively small errors in 
projected energy may translate to significant uncertainty in projected profit 
because utility-scale PV plants are typically leveraged financially. 

Here, we explore statistical modeling of global horizontal irradiance (GHI) 
at spatial and temporal scales relevant to design and operation of a utility- 
scale PV power plant, i.e., on the order of 1 km 2 and a few minutes. We 
apply recent advances in spatio-temporal statistical methods and illustrate 
our results with data from a 1.2MW PV plant at La Ola, Lanai, HI. We 
pursue semiparametric (i.e., data-driven) rather than parametric approaches 
because a successful model could then be applied regardless of weather con¬ 
ditions at the location of interest. In contrast, parametric models implicitly 
assume that random variables in the model (e.g., GHI) are well-described by 
specified distributions, an assumption which may not hold if weather con¬ 
ditions change. We compare the resulting models with the commonly used 
simple spatial average which estimates aggregate irradiance over a plant’s 
footprint by averaging measurements from sensors located in or near the 
plant. 

A challenge in modeling irradiance data is incorporating the interaction 
between time and space. Especially in the presence of advecting clouds, the 
irradiance observed at one location is likely to also be observed at other 
locations but with a time shift. Thus we anticipate that irradiance will ex¬ 
hibit a spatial autocorrelation that varies with time. Spatio-temporal models 
explicitly account for this autocorrelation and thus may predict aggregate ir¬ 
radiance more accurately than does a simple spatial average. 

Fundamentally, GHI can be viewed as a random spatio-temporal process. 
Literature reports several efforts at modeling individual time-series of irradi- 
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ance, and substantially fewer attempts to construct spatio-temporal models 
of irradiance considering several proximal locations. The literature on in¬ 
dividual time-series modeling includes approaches based on autoregressive 
integrated moving average (ARIMA) analysis (e.g., Yang et al. 2012), non¬ 


linear autogressive analysis ( Glasbey} |2001 ), regression analysis (Reikard 


2009), artificial neural networks analysis (Paoli et ah, 2010), fc-Nearest Neigh¬ 


bors algorithm (Paoli et ah, 2010) and Bayesian inference (Paoli et ah, 2010). 
These approaches focus on forecasting irradiance considering only the mea¬ 
surements at a selected location separately from measurements at other loca¬ 
tions. Paoli et ah (2010) considers a type of artificial neural network known 


as Multi-Layer Perceptron (MLP) network and finds their method performs 
as well or better than other methods such as ARIMA analysis, Bayesian in¬ 
ference, and A;-Nearest Neighbors. Yang et ah (2012) introduce an ARIMA 


model that incorporates low-resolution, ground-based cloud cover data to ob¬ 
tain next hour solar irradiance. The authors state that their ARIMA model 
outperforms all other time series forecasting methods in four of the six sta¬ 
tions they tested. In both the MLP and ARIMA methods, the model does 
not incorporate a spatial component but only models irradiance in time. 

The literature on spatio-temporal modeling of solar irradiance is limited. 
To our knowledge, ours is the first attempt to model irradiance at time 
and spatial scales relevant to modeling a utility-scale PV plant. Glasbey 


and Allcroft (2008) model irradiance data from ten sensors roughly at 5km 


spacing using a spatio-temporal autoregressive moving average (STARMA) 
model. The STARMA model incorporates the Euclidian distances between 
two points in order to model the spatial structure of the data. However, the 
STARMA model used in Glasbey and Allcroft (2008) assumes a separable 
covariance structure, an assumption which we find to be questionable at the 
scale of a single PV plant. 

We propose a spatio-temporal model that incorporates a data-driven 
method for modeling the time series component. Our model improves upon 
the works of Yang et al. (2012) and Glasbey (2001) because we do not as¬ 
sume a parametric form for the time component of the model, and improves 
on Glasbey and Allcroft (2008) through the nonseparable covariance struc¬ 
ture. The remainder of this paper is organized as follows. In Section [2j we 
discuss how the time series structure is modeled via a semiparametric model 
fitted with a data-driven method known as spline-backfitted kernel (SBK) 
estimation. In Section |2.3[ we introduce the spatio-temporal model, and 


compare the model’s performance assuming either a separable or a nonsepa- 
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rable covariance structure to evaluate whether separability can be assumed. 
In Section [3j we apply the model to irradiance data from the La Ola photo¬ 
voltaic plant in Lanai, HI. Finally, we provide discussion and conclusions in 
Section [0 


2. Modeling Irradiance 

Let Q s j. represent an observable process, e.g., measured GHI, at time t 
and location s for t — 1, 2,..., T and s = 1, 2, ..., S. If there is no interac¬ 
tion in time and space, the covariance function of Q S: t can be written as a 
product of two functions where one function is dependent on time only, and 
the other on location alone. Such a covariance function is called “separable.” 
However, when interactions in space and time are present, the covariance 
function is “nonseparable;” i.e., it cannot be factored into two separate func¬ 
tions. Spatio-temporal models with separable covariance are much easier to 
implement. But in the presence of space-time interaction, separable models 
do not perform well, and can lead to misleading or incorrect conclusions. 

For modeling Q st , consider 

Qs,t — R s ,t + Z s t i t — 1,..., T, s = 1,..., S, (1) 


where, at time t and location s, R S)t represents the true irradiance signal and 
Z s j is a noise process. Furthermore, decompose the noise process into a sum 
of three terms, 

Z s ,t = Xs,t + W,t + £s,ti (2) 


where X s j is a time series process at location s, Y st is a spatial process at 
time t, and e Sjt is a multivariate error process with mean zero and TS x TS 
covariance matrix X(s,f). If the process is separable, then the covariance 
matrix can be written as S(s,t) = A (t) ® T(s), where A (t) is a T x T 
temporal covariance matrix, T(s) is an S x S spatial covariance matrix, and 
<8) is the Kronecker product (Woolrich et al. 2004). 

There are a variety of methods for fitting separable spatio-temporal mod¬ 
els to space-time data. A review of space-time analysis methods and their 


computational counterparts can be found in Harvill (2010) or Cressie and 
Wiklc (2011). We consider three approaches to fitting model (J2]) . The first 


approach fits a spatial model at each time. Then spatial residuals are com¬ 
puted, and for each location a time series model is fitted to the spatial resid¬ 
uals at each location. The second approach models the time series at each 
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location, computes time residuals, and then fits a spatial model at each time 
to the residuals. Both of these approaches carry the assumption that the 
covariance structure is separable in space and time. The third approach re¬ 
moves the separability assumption, jointly modeling time and space using 


the spatio-temporal model introduced in Section 2.4 


2.1. Modeling the Time Series Component 

For modeling time series data arising from a dynamic process, such as 


solar irradiance, nonlinear models often out-perform linear models (Tong 


1993). Although the class of nonlinear time series models is infinitely large, 


there are many popular parametric nonlinear models including the bilinear 


(Haggan and Ozaki, 1981), and a variety of threshold autoregressive models 


model (Subba Rao and Gabr, 1984), the exponential autoregressive model 


(Tong, 1983 van Dijk, 1999). When one of these parametric models is known 


to be appropriate for analyzing the time series, it should be used for analyzing 
the series. However in the analysis of solar irradiance, no specific class of 
parametric nonlinear model has been shown to be generally applicable, and 
therefore we pursue a semiparametric approach. In this section, we examine 
only the time component of the model. So to ease notation, for the remainder 
of this section, we consider the location fixed, and suppress the s subscript; 
that is X st = X t for a fixed value of s. 

A highly versatile semiparametric model is the functional coefficient au¬ 


toregressive model of order p (FCAR(p)), first introduced by Chen and Tsay 


(1993). The FCAR(p) model has an additive autoregressive structure, but 


with coefficients that vary as a function of some variable, u say, which can 
be exogeneous to the series X t . In the pure time series context, u is a lagged 
value of the series, and we write u t = X t -d- In this paper, we restrict the 
FCAR(p) models to those with u t = X t -d, and so define the FCAR(p) model 
as 


X t = m 0 (u t ) + ^ 7nj(ut)X t _j + u t , t — p+l,...,T (3) 

3 = 1 

where u t = X t _ d , d < p, rrij(-), j = 0,1,2, ...,p are measurable func¬ 
tions of u, and {u; t } is a sequence of independent and identically distributed 
(IID) random variables with mean zero and constant variance. 

Reasonble use of the FCAR(p) model requires only that the model is 
additive, and places few restrictions on the functional coefficients. To il¬ 
lustrate the versality of the FCAR(p) model, note that if m 0 (u t ) = 0, and 
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rrij(u t ) = aj, j = 1,2,... ,p are constants, then the FCAR(p) reduces to a 
linear autoregressive model of order p, X t = a\X t -i + • • • + a p X t - p + oj t . 
Another example is, for each j = 1 ,,p, the coefficients are of the form 
rrij(X t _ d ) = aj + /3j exp {— <LA t 2 _ d }. Then the FCAR(p) model reduces to 
the exponential autoregressive model of Haggan and Ozaki ( jl981 ). More¬ 
over, the FCAR(p) formulation allows for a mixture of models; for example, 
mi(Xt-d) = oli and m 2 (A ( _d) = «2 + ^2 exp {— <LY t 2 _ d }. Fan and Yao (2003) 


contains a review of methods for fitting the FCAR(p) model, and related 
inferential procedures. In the following section, we propose a more recent, 
improved method for fitting the FCAR(p) model. 

2.2. Spline-Backfitting Kernel Estimation 

With no presupposed form for the functional coefficients, we propose a 
data-driven method for finding pointwise estimates of the functions rrij(u), 
j — 0,1,2,... ,p. A number of methods are proposed in the statistics lit¬ 
erature. Chen and Liu (2001) and Cai et al. (2000) propose a kernel re¬ 


gression approach to fitting the model. Harvill and Ray (2006) extend the 


procedure to the case when the series is a vector process. More recently, 
spline-backfitted kernel (SBK) estimation has been proposed as a means for 
fitting semiparametric models like the FCAR(p) model. SBK estimation is an 


adaptation of the backfitting algorithm of Hastie and Tibshirani (1990), and 


combines the computational speed of splines with the asymptotic properties 
of kernel smoothing. 

The SBK method uses an under-smoothed centered standard spline pro¬ 
cedure to pre-estimate the rrij{u), j = 0,1,2 ,...,p. These pre-estimates, 
also called “oracle” estimates, are used to find psuedo-responses. Then the 
pseudo-responses are used to estimate the mj (■ u ) through a kernel estimator; 
e.g., the Nadaraya-Watson estimator. The SBK method was first proposed 


by Wang and Yang (2007) for estimating nonlinear additive autoregressive 


models. Wang and Yang (2009) adapt the SBK method for IID data, Liu 


et al. (2011) adapt it to generalized additive models, and Ma and Yang (2011) 


to partially linear additive models. Liu and Yang (2010) propose the SBK 


method for additive coefficient models. 

The ability to estimate rrij{u), j — 0,1,2,... ,p relies on the good approx¬ 
imation properties of spline estimators. For any j = 0,1, 2,... ,p, assume 
nij(-) is sufficiently smooth. Without loss of generality, u can be defined 
on the compact interval [0,1]. Define the integer N ~ T 2// ' 5 logT, and let 
H = (N + 1) -1 . Let 0 = £0 < £1 < • • • < 6v < £n +1 = 1 denote a sequence of 
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equally spaced knots. There is a set of basis functions b 0 (u), b\{u),... b N+1 (u ) 
and a set of constants Aoy, A i ^,..., \n+ij such that the spline estimator of 
the j-th coefficient is 


N+l 


rrij [u) ~ rrij [u ) = 


^ ^ A k,jbk(u). 

k =o 

For the basis functions, we choose the linear 5-spline basis, defined by 

lu-fkl' 


bk(u) = 1 - 


H 


(N + l)u - k + 1, £ fe -i < u < 

= <( k + 1 - (iV + l)u, €k<u<£k+ 1, 
0, otherwise. 


( 4 ) 


The coefficients A 0 y, Aij,..., Ajv+iy are estimated via least squares; that is, 
the Afcj, k = 0,1,..., N + 1, j = 0,1, 2,... ,p are the values of A^j that 
minimize the sum of squares 


T 

E 

t=p+i L 


' JV+l 


“ Si S A fcJ 6 fc («) X t _,- 

j= i l fc=0 


( 5 ) 


The spline-estimated functional coefficients are then used to compute 
“pseudo-responses.” Specifically, for each j' = 0,1,2 ,...,p,j' ^ j, the 
pseudo-responses are defined by 


p 

W t ,j' = X t - ^2 fhj(u)X t -j, t = p + l,p + 2,... ,T. 


For each f = 0,1,2,... ,p, let W y = (W p+ , Wtj')' represent the 

vector of pseudo-responses, and define the matrix 


M = diag {K h (. X p+1 _ d - u ),..., K h (X T -d - u)} , 


where Kh(-) = h 1 K(-/h), K(-) is a kernel function, and h > 0 is a band¬ 
width. Then the SBK estimator of myiv) is 


rhj, 0) 



C'MCJ 


-l 


Ic'MW,,, 


( 6 ) 
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where 


C' = 


Ap_|_1 ^\ p _|_2 

Ap_|_i (-^-p+1 —d u) Ap_|_ 2 (^-p+2—d w) 


X T 

Xt {Xx-d ~ u) 


The idea behind SBK estimation is to under-smooth in the pre-estimates in 
order to reduce the bias. This under-smoothing leads to a larger variance 
which is reduced in the kernel estimation step. The use of splines for the 
pre-estimates is computationally fast while using kernel smoothing provides 


convenient asymptotic results (Liu and Yang, 2010). 


To illustrate, consider a series of T = 500 observations from the exponen¬ 
tial autoregressive model of order p = 2 (EXPAR(2)) given by 


X t = 0.5 


l.le- 50 **- 1 


} X t _] + jo.3 - 0.5e- 50X *-i} A t _ 2 + 0.2 co t , (7) 


where the oo t are standard normal errors. A time plot of a mean-centered 
realization of length 500 of such a series is given in Figure [lj 

Figure [T| about here. 

Since X t -\ is the functional variable, and is one of the autoregressive lags, 
the model in (JT[) must be rewritten and treated as 

X t = m o (AVi) + rni (A^-i) X t _ 2 + 0.2ay. 


Consequently the functional coefficients of the autoregressive terms are 
mo(u t ) = 0.5 ut — 1.1 u t e~ 50u * and mi(u t ) = 0.3 — 0.5e _50 “*, 
where u t = X t _ { . 

To estimate the functional coefficients, begin by accounting for the vari¬ 
ability in the response due to the term mo('Ut). Remove that variability, and 
use the pseudo-responses to estimate mi(u t ). Noting that the maximum lag 
is 2, we have 

1. For j — 0 in equation Q, fit a spline to the mean-centered data. The 
result is an estimate rho(ut) of mo(ut). Note that the sum of squares in 
equation (J5]) has no second sum, since we are considering only a single 
value of j,j = 0; that is, equation ([5]) reduces to 

T f N+l 

t =3 l k =0 
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2. Compute pseudo-responses Wt,i, t — 3,... ,T using 

W t ,i = X t - rh Q (ut), £ = 3,4, ...T. 

These pseudo-responses are a proxy for the original realization, but 
with the effect of the m 0 (u t ) removed. 

3. Fit a kernel regression to the pseudo-responses to get the SBK estimate 
rh^ut) of m\{u t ). 


Repeat the procedure, reversing the roles of mo and m\. To get the coeffi¬ 
cients Afc, 2 , k = 0,1,..., N + 1 for the spline, minimize the sums of squares 


E 


Xt- 


(N+l 

iE a 


k, 0 


bk(u t ) 


2 




t =3 


fc =0 


The pseudo-reponses W t> 2 , t — 3,4,... ,T are computed via 


W t ,2 = X t - rhi(u 2 )X t ^ 2 - 

Figure |2] shows the estimation results of a simulated series from the ex¬ 
ponential autoregressive model in equation ([7]) with 11D standard normal co t 
and 500 samples. The dark curves of dots are the estimated functions, and 
the solid (thin) lines are the true functions. The dashed lines are the 95% 
pointwise confidence bands. 

Figure [2] about here. 


2.3. Spatial Modeling for Lattice Data 

For a fixed time t, consider a lattice process Y s , s = 1, 2,... S. In this 
section, to ease notation, the time index t is suppressed. Let A f s represent 
a neighborhood around location s. The simultaneous autoregressive (SAR) 
model is defined as _ 

v; = X] 

j'e M s 


where B SJ ' is a set of coefficients that induces the spatial autocorrelation be¬ 
tween locations f and s in A f s , and S s are independent, zero-mean, constant 
variance errors. The SAR model was first introduced by Whittle (1954). 
The adjective “simultaneous” describes the S autoregressions that occur si¬ 
multaneously at each data location in the formulation. To fit this model 
in Section |3j we will employ a two nearest neighbor structure to define J\T S . 
The model is fitted using maximum likelihood estimators which are obtained 
using the R package spdep (Bivand et ah, 2013). 
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2-4- Spatio-temporal Modeling 

We now introduce two spatio-temporal models both of the form in (JT|). 
The first model we consider uses the noise process defined in (|2]) and assumes 
a separable covariance structure for this process. The time series structure, 
X s>t , is modeled as an FCAR model using the SBK method. Values of p and 
d in (|3]) are allowed to vary between locations, and the spatial structure, Y Sjt , 
is modeled separately using a SAR model at each time. By modeling the time 
and space components separately, we are implicitly assuming separability. If 
this assumption is appropriate, then the order in which the two models are 
fit (time-then-space, or space-then-time) should not matter. 

The second spatio-temporal model does not assume separability. Com¬ 
bining the FCAR(p) model with a generalized version of the SAR model, 
we define the space-time functional coefficient simultaneous autoregressive 
(FCSAR) model as 


b p 3 

^ = E E fds,£,w^£,t—w T ^ ^ (Z s ,t— d s ) Z s t—k T (8) 

w=l k =1 


where b is the spatial time order for the spatial component in the model, 
(3 s ,e,w is the spatial autocorrelation between locations s and t at a time lag of 
w, Z S)t _d s is a delay variable, and e S) t are IID with mean zero and constant 
variance a 2 e . We allow the values of p s and d s to vary among locations. The 
FCSAR model is based on the space-time simultaneously specified autore¬ 
gressive model of Woolrich et ah (2004). 


3. Application and discussion 

To illustrate the utility and compare the performance of the proposed 
models, we model GHI data at the 1.2 MV La Ola PV plant on the island 
of Lanai, Hawaii. We chose to model GHI rather than POA irradiance to 
illustrate a more general application of our method. The La Ola PV plant 
contains a grid of 12 single-axis tracked arrays arranged in three columns and 
four rows covering a total area of approximately 250m by 250m. At the time 
this work was undertaken, the La Ola data comprised the only available 
irradiance data set with concurrent measurements from a regular grid of 
sensors across the footprint of a single PV power plant. However, the La Ola 
data are POA irradiance rather than GHI. Sandia National Laboratories and 
SunPower Corporation designed an irradiance measurement system in part 
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to study the effects of the movement of cloud shadows across the PV arrays 
on the power output of the plant (Kuszamaul et al. 2010). Plane-of-array 
(POA) irradiance (in W/m 2 ) is measured at the midpoint of each tracking 
array using LiCor-200 pyranometers. 

Before fitting the models it is necessary to remove the diurnal trend, a 
step which we found somewhat difficult. Clear sky models are available for 
removing trends from measured GH1 data; a review of some of these models 
can be found in Reno et al. (2012). We set out to use clear sky models to 
remove the diurnal trend, which would present no great difficulty for mea¬ 
sured GH1. We know of no equivalent “clear-sky” model for POA irradiance 
(although, if the tracking algorithm is known with sufficient precision, such a 
model could be assembled by applying a GHI-to-POA translation model, e.g., 
the DISC model of Maxwell (1987) to the output of a clear-sky model). We 
translated POA irradiance to GHI by assuming the isotropic sky model for 
the sky diffuse irradiance and using concurrent measurements of diffuse hor¬ 
izontal irradiance (DHI) and direct normal irradiance (DNI) from a nearby 
rotating shadowband radiometer (RSR) operated by the National Renewable 
Energy Laboratory. Because tracker rotations are not measured we estimated 
the angle of incidence on the modules using a generic algorithm for single¬ 
axis tracking (Lorenzo et al., 2011). Even with the use of measured DHI 


and DNI, the estimated GHI profiles were not well-matched with the output 
of available clear-sky models, and the clear sky models performed poorly in 
removing the trend. Consequently, we removed the diurnal trend in the es¬ 
timated GHI by using a local polynomial kernel regression implemented in 
the KernSmooth package (Wand, 2012) in the R programming software. 

We selected one year (i.e., January 1, 2010 to December 31, 2010) of POA 
irradiance measurements, which are recorded every second. We observed 
little to no variability from one irradiance measurement to the next at one 
second intervals and consequently reduced the data by time averaging. We 
investigated time-averages of lengths of 30 seconds, 1 minute, 5 minutes, 
and 10 minutes. Longer time averages (e.g., 15 and 20 minutes) were also 
considered but did not appear to be signficantly different from the 10 minute 
averages. Much of our exploratory work was done using 10-minute averaged 
data to reduce computational burdens. 

The top time plot in Figure [3] contains the 10-minute time averages of 
estimated GHI in solid black superimposed with the local polynomial ker¬ 
nel regression estimate in dashed red for March 10. The bottom time plot 
contains the residuals, hereafter referred to as “transformed irradiance,” ob- 
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tained after removing the diurnal trend by subtracting the kernel fit. 


Figure [3] about here. 


Having removed the diurnal trend, we next examined a large number of 
time plots of GHI to find days with different variability characteristics. For 
each day, the weather condition was classified visually as being in one of 
three categories: clear, partly cloudy, and overcast, by the variability and 
magnitude of GHI. Figure [4] shows the 10-minute time averaged irradiance 
and the transformed irradiance a clear day (October 21), a partly cloudy day 
(April 1), and an overcast day (August 3). 


Figures 4(a) through 4(c) about here. 


For 2010, in Lanai, HI, only six days could be classified as “clear” throughout 
the entire day. For partly cloudy and overcast conditions, we found many 
days. For both of these weather conditions, six days in 2010 were randomly 
selected. 

For each selected day, we explored whether assuming space-time covari¬ 
ance separability in ([l]) would be justified. Using the separable models we 
fit the data in two ways: space-then-time and time-then-space. If the sepa¬ 
rability assumption is appropriate, then the two models are equivalent and 
should yield similar results. For the space-then-time approach, we first fit 
the SAR model to the 16 sensors for each time, t. We obtain the residuals 
from the fitted SAR model, and then apply the FCAR model to each sensor 
separately. For the time-then-space model, we first fit the FCAR model to 
the detrended irradiance for each of the 16 sensors, and then the residuals 
from the fitted FCAR model are fit with the SAR model at each time point. 
For each approach the root mean square errors (RMSEs) (over all sensor 
locations and times) for eighteen days with three different weather condi¬ 
tions are found in the first two columns of Table [TJ For all days considered 
in this study, the RMSE for the model that fit space first is considerably 
smaller than than when time was fit first. This is a strong indication that 
the assumption of separable covariance structures is not supported and that 
nonseparable models should be employed. 

For a fixed time t. because PV cells are at fixed locations, the spatial 
structure can properly be considered a lattice. Consequently, for the nonsep¬ 
arable model we fit the FCSAR model in (|8j) for spatial time orders 6=1,2. 
The last two columns in Table [Q contain values of RMSE for these two fits. 
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The FCSAR model with b = 2 has the smallest RMSE for all 18 days, indicat¬ 
ing the best fit among the models considered. For cloudy and partly cloudy 
conditions RMSE decreases substantially from b = 1 to b = 2 indicating that 
a lagged model is needed for greater prediction accuracy. 

Table [D about here. 

Figures [5] through [7] contain six plots, grouped in three pairs. Each figure 
displays one sensor location for one day: a clear day (October 21, Figure[5]); a 
cloudy day (April 1, Figure [6]); and a partly cloudy day (August 3, Figure [7]). 
For any one pair of plots, the top graph contains the GHI data represented 
by a solid black line, and the modeled GHI represented by a red dashed line. 
The bottom graph contains the detrended GHI data (solid black line) and 
the detrended modeled GHI (red dashed line). For all three days, the set 
of two plots labeled (a) were fit using a separable time-then-space approach; 
the two sets of plots labeled (b) were fit using a separable space-then-time 
approach; and the plots labeled (c) were fit using the nonseparable FCSAR 
model with b = 2. 


Figures [5] through [7] about here. 


The collection of figures illustrates the nonseparable approach yields the best 
fit, regardless of the weather conditions, which is in agreement with mini¬ 
mum RMSE in Table [lj However, where RMSE is an aggregate measure 
of goodness-of-fit, the plots illustrate that at individual time points, the 
goodness-of-fit is uniformly better for the nonseparable model. 

Forecasting the FCSAR model in time is largely dependent on using 
the SBK method for forecasting the FCAR term in (pi). In Patrick et al. 


(2015), methodology is presented for forecasting a FCAR model using the 
SBK method. In this paper, we examine the performance of forecasting ([8]) 
in space for unobserved locations by using cross-validation. We simulated 
unobserved locations by omitting one or several sensors from our data set, 
and compared FSCAR model performance with a commonly used interpo¬ 
lation technique to judge the potential improvement offered by the FCSAR 
model. 

Unobserved data are often estimated by interpolating between nearby 
sensors; one such technique is natural neighbor interpolation which comprises 


a weighted average with weights determined by a Voronoi partition (Sibson 


1981). A Voronoi partition divides the space that contains the sensors into 
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regions. Each sensor will have a corresponding region consisting of all points 
closer to that sensor than to any other. We constructed a Voronoi partition 
on the set of training sensors along with the location of the missing sensor. 
For cross-validation, we took the weighted average of the training sensors 
where the weights are determined by the size of the regions. This weighted 
average is used for the prediction for the missing sensors. 

We fit the FCSAR model to the training set of sensors with b = 2 and 
using a two nearest neighbor structure for A f s . For each missing sensor, we 
determined the two nearest neighbors and predicted the irradiance by using 
the estimated /3’s for those neighbors. 

For our set of 16 sensors, we calculated the predictions with k = 1,2, 3,4 
missing sensor locations. For k > 1, we predicted for each missing location 
one at a time. We calculated the root mean prediction error (RMPE) as 

(4«-z»,.) 2 Y 

sGfli \ t=\ / 

where fij is the ith set of k missing sensors, Z s>t is the predicted irradiance for 
the sth sensor at time t, and Z s t is the observed irradiance. The RMPE q. 

possible combinations of k missing sensors. 

The mean RMPE is calculated as 

_ i 

RMPE k = ~Y J RMPE Qi . 

2=1 

To compare the FCSAR model to the interpolation method, we take the ratio 

RMPE k for FCSAR 
RMPE k for interpolation 

The ratios for all 18 days are plotted in Figure [8j Ratios less than one 
indicates that the FCSAR model performs better at predicting the missing 
sensors. All ratios are less than one except for two days both of which are 
clear days. 


is calculated for all K = 


16 

k 


Figure [8] about here. 
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To examine the effect of different time averaging windows on the FSCAR 
model’s performance we fit the model for a range of time averaging windows, 
from 10 minutes down to 30 seconds. For each day, we calculated the model’s 
RMSE as well as the adjusted coefficient of determination R 2 . The adjusted 
coefficient of determination R 2 a quantifies the level of agreement between the 
data and a fitted model taking into account the number of variables used in 
the model: 

2 S-W (TS - u Pit) 
a SS Tot J(TS) ’ 

where 

S T 

ss « = EE(h-Z.j)’. 

s= 1 t= 1 V 
S T 

s=l t =1 


z 


1 

TS 


s=1 t =1 


S,t 1 


Z Sj t is the predicted irradiance for the sth sensor at time t, Z Sit is the observed 
irradiance, and z/pit is the number of parameters used in the fit. Since we are 
using kernel regression to fit the time series, we must estimate the number of 
parameters associated with that regression. For the SBK estimate of the kth 
coefficient function in ([8]), the effective number of parameters is the trace of 
the smoother matrix 


1 

0 


fic'Mcj 


1 

T 


C'M 


m 


@ 


see 


Hastie and Tibshirani 

1990 

Cai and Tiwari 

2000) 


the total number of parameters as the sum of the parameters in the first 
double sum in ([8]) plus the sum of the effective number of parameters for the 
FCAR term. The values of R 2 and RMSE are shown in Table [2j We show 
the fits of three days for the different time averages in Figures |9]flTl 

Table [2] about here. 


Table [2] shows that as the time averaging window decreases, RMSE in¬ 
creases and and R 2 decreases, both indicating increasing disagreement be¬ 
tween data and model. However, as the time averaging window decreases, 
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variance in time averaged data at any individual location increases substan¬ 
tially (Figures [9 (XT). As ramps in the data increase in both magnitude 
and frequency the largest residuals of the fitted model also increase. Simi¬ 
lar patterns are evident in the spatially-averaged data. Figure p~2| compares 
distributions for the spatially averaged detrended irradiance data and cor¬ 
responding distributions residuals for the fitted FSCAR model for a partly 
cloudy day. As the averaging window decreases., outliers increase in both the 
data and the model residuals also increase, leading to the increasing RMSE 
and decreasing R 2 a evident in Table [ 2 ] However, the FCSAR model continues 
to fit the bulk of the data equally well across all time averaging windows, 
as is demonstrated by the relatively constant boxes and whiskers across the 
different time averages. Thus, the FCSAR model follows time averaged data 
equally well for various averaging windows. 

Figures [9] through pTl] about here. 

Figure [12] about here. 


4. Conclusion 

We have presented a novel nonseparable spatio-temporal model for GHI 
data. This approach, termed the FCSAR model, outperforms a natural 
neighbor interpolation when predicting GHI at unobserved locations over 
the footprint of a PV system. We compared the nonseparable FCSAR model 
with simpler, separable models, and find little support for models that as¬ 
sume a separable covariance structure. The FSCAR model integrates an 
FCAR form for the time series component of the model and a SAR form for 
the spatial component. The FCAR(p) form of the time series component of 
our nonseparable model makes the FCSAR model flexible and reliable, and 
may be suitable for fitting irradiance data in general. Currently, the model is 
fit separately on each day. Further research will consider validating the fitted 
models by comparing predicted aggregate irradiance with generated power 
for a much larger solar power plant than La Ola. Future work may also 
explore adding a weather condition covariate that will allow the model to be 
fit over days with different weather conditions, by permitting the coefficient 
functions in the time series structure to vary based on weather condition. 
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Time Plot of Mean Centered Exponential Autoregressive Series 



Figure 1: Mean-centered realization of length T = 500 from an EXPAR(2) model given in 
equation (jTJ). 
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Figure 2: Spline-backfitted kernel estimates of the coefficients mo(u) = 0.5u — 
l.luexp{—50u 2 } (left panel) and mi (it) = 0.3 — 0.5 exp{—50u 2 } (right panel). Heavy 
curves are the estimates rh${u) and mi (it); thinner lines are the true functions mo (it) and 
mi (it); dashed lines are 95% confidence bands. 
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March 10, 2010 



0 1 2 3 4 5 6 7 8 9 10 12 14 15 16 18 20 22 24 

Time (Hours) 


Figure 3: Top graph is the time plot of 10-minute averages (solid black) of irradiance 
measurements for March 10 with the local polynomial kernel estimate (dashed red) super¬ 
imposed. The bottom plot is transformed irradiance (residuals after using local polynomial 
kernel regression to remove the diurnal trend). 
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October 21, 2010 
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(a) October 21, clear. 


April 1,2010 



Time (Hours) 

(b) April 1, partly cloudy. 


August 3, 2010 



0 1 2 3 4 5 6 7 8 9 10 12 14 15 16 18 20 22 24 

Time (Hours) 

(c) August 3, overcast. 


Figure 4: Time plots of 10-minute time averaged irradiance and of transformed irradiance 
for (a) a clear day, October 21, (b) a partly cloudy day, April 1, and (c) a overcast day, 
August 3. 25 







Table 1: Root mean squared error (RMSE) for the four spatio-temporal models of the 
days with clear, partly cloudy, and overcast conditions. Columns S-T and T-S contain the 
RMSE for the separable spatio-temporal models. Column S-T contains RMSE for data 
with the spatial component fit first, then time; Column T-S contains RMSE with the time 
component fit first, then space. The last two columns contain RMSE for the nonseparable 
FCSAR model with spatial time orders b = 1 and 6 = 2, respectively. 


Condition 

Date 

Separable 

FCSAR 

S-T 

T-S 

6=1 

6 = 2 

Clear 

Feb. 3 

0.36 

2.45 

0.36 

0.16 


Feb. 16 

3.57 

14.98 

2.18 

1.36 


Mar. 18 

0.34 

2.64 

0.34 

0.22 


Mar. 19 

0.32 

6.17 

0.35 

0.22 


Oct. 21 

0.67 

4.12 

0.55 

0.42 


Dec. 16 

0.97 

6.60 

0.83 

0.44 

Partly Cloudy 

Mar. 7 

6.88 

63.74 

5.49 

3.35 


Apr. 1 

7.61 

99.56 

5.47 

4.51 


May 10 

5.91 

56.86 

4.90 

3.72 


June 4 

8.46 

51.58 

4.84 

3.36 


June 28 

3.63 

55.10 

2.39 

1.76 


Nov. 15 

10.39 

58.24 

7.14 

6.00 

Overcast 

Feb. 1 

7.52 

62.61 

6.20 

4.88 


Mar. 15 

12.82 

118.23 

9.12 

5.50 


Apr. 6 

5.66 

48.15 

2.97 

2.43 


May 31 

11.44 

91.35 

5.92 

5.08 


Aug. 3 

5.63 

65.81 

3.93 

2.82 


Oct. 27 

4.07 

94.30 

3.10 

1.89 
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(a) October 21, Time-Space. 


(b) October 21, Space-Time. 


October 21,2010 



Time (Hours) 

(c) October 21, Nonseparable. 


Figure 5: Time plots of 10-minute time averaged irradiance and of transformed irradiance 
with predicted values superimposed in red for a clear day, October 21. Forecasting was 
conducted using the (a) time-then-space fitting (b) space-then-time fitting (c) nonseparable 
model. 
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(a) April 1, Time-Space. 


(b) April 1, Space-Time. 


April 1,2010 



Time (Hours) 


(c) April 1, Nonseparable. 

Figure 6: Time plots of 10-minute time averaged irradiance and of transformed irradiance 
with predicted values superimposed in red for a partly cloudy day, April 1. Forecasting was 
conducted using the (a) time-then-space fitting (b) space-then-time fitting (c) nonseparable 
model. 















































August 3, 2010 



(a) August 3, Time-Space. 


August 3, 2010 



(b) August 3, Space-Time. 


August 3, 2010 



(c) August 3, Nonseparable. 

Figure 7: Time plots of 10-minute time averaged irradiance and of transformed irradiance 
with predicted values superimposed in red for an overcast day, August 3. Forecasting was 
conducted using the (a) time-then-space fitting (b) space-then-time fitting (c) nonseparable 
model. 
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Table 2: Root mean squared error (RMSE) for the nonseparable spatio-temporal models with b = 2 of the days with clear, 
partly cloudy, and overcast conditions. The columns are the RMSE for data at 30-second, 1-minute, 5-minute, and 10-minute 
averages. The values in parenthesis are the R%. 


Condition 

Date 


30-sec 


1-min 


5-min 


10-rnin 

Clear 

Feb. 3 

0.38 

(0.990) 

0.30 

(0.993) 

0.19 

(0.999) 

0.16 

(0.999) 


Feb. 16 

4.98 

(0.952) 

3.68 

(0.970) 

1.82 

(0.988) 

1.36 

(0.963) 


Mar. 18 

1.79 

(0.970) 

0.99 

(0.989) 

0.42 

(0.999) 

0.22 

(0.992) 


Mar. 19 

0.81 

(0.999) 

0.63 

(0.998) 

0.32 

(0.952) 

0.22 

(0.993) 


Oct. 21 

3.94 

(0.846) 

2.21 

(0.926) 

0.67 

(0.991) 

0.42 

(0.983) 


Dec. 16 

0.67 

(0.998) 

0.61 

(0.998) 

0.49 

(0.952) 

0.44 

(0.993) 

Partly Cloudy 

Mar. 7 

25.66 

(0.920) 

17.68 

(0.960) 

5.42 

(0.991) 

3.35 

(0.997) 


Apr. 1 

31.75 

(0.932) 

24.65 

(0.960) 

6.99 

(0.996) 

4.51 

(0.998) 


May 10 

25.91 

(0.927) 

19.68 

(0.954) 

7.70 

(0.987) 

3.72 

(0.999) 


June 4 

22.05 

(0.907) 

20.60 

(0.920) 

7.21 

(0.986) 

3.36 

(0.995) 


June 28 

13.85 

(0.935) 

10.12 

(0.966) 

3.88 

(0.994) 

1.76 

(0.999) 


Nov. 15 

27.69 

(0.908) 

22.56 

(0.935) 

9.24 

(0.979) 

6.00 

(0.987) 

Overcast 

Feb. 1 

26.46 

(0.899) 

24.12 

(0.917) 

9.27 

(0.982) 

4.88 

(0.993) 


Mar. 15 

27.85 

(0.938) 

20.29 

(0.968) 

6.05 

(0.996) 

5.50 

(0.997) 


Apr. 6 

24.22 

(0.883) 

19.44 

(0.930) 

4.01 

(0.997) 

2.43 

(0.997) 


May 31 

42.26 

(0.899) 

30.26 

(0.943) 

9.60 

(0.988) 

5.08 

(0.997) 


Aug. 3 

25.92 

(0.948) 

20.59 

(0.966) 

6.15 

(0.995) 

2.82 

(0.998) 


Oct. 27 

7.20 

(0.977) 

5.52 

(0.989) 

2.61 

(0.998) 

1.89 

(0.999) 




Number of Sensors Missing 


Number of Sensors Missing 


(a) Clear days. 


(b) Partly cloudy days. 



Number of Sensors Missing 
(c) Overcast days. 

Figure 8: Plots of the ratios of the RMPE for (a) the clear days, (b) the partly cloudy 
days, and (c) the overcast days. The ratios are calculated as the RMPE of the FCSAR 
model divided by the RMPE of linear interpolation. 






10 Minute Average for Oct 21 


5 Minute Average for Oct 21 



(a) 


1 Minute Average for Oct 21 




hour of day 

(b) 


30 Second Average for Oct 21 


- observed 

- - fit 



11.0 11.5 12.0 12.5 13.0 

hour of day 


(c) (d) 

Figure 9: Plots of the detrended observed irradiance and the fit of the FCSAR model 
for sensor 1 from 11:00 to 13:00 on October 21 (a clear day). The different plots are for 
different averages: (a) 10 minutes, (b) 5 minute, (c) 1 minute, and (d) 30 second. 
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10 Minute Average for Apr 01 



11.0 11.5 12.0 12.5 13.0 

hour of day 


(a) 


1 Minute Average for Apr 01 



11.0 11.5 12.0 12.5 13.0 

hour of day 


(c) 


Figure 10: Plots of the detrended observed i 
sensor 1 from 11:00 to 13:00 on April 1 (a p 
different averages: (a) 10 minutes, (b) 5 mir 


5 Minute Average for Apr 01 



11.0 11.5 12.0 12.5 13.0 

hour of day 


(b) 


30 Second Average for Apr 01 



11.0 11.5 12.0 12.5 13.0 

hour of day 


(d) 







10 Minute Average for Aug 03 
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hour of day 


(a) 


1 Minute Average for Aug 03 



11.0 11.5 12.0 12.5 13.0 

hour of day 


(c) 

Figure 11: Plots of the detrended observed b 
sensor 1 from 11:00 to 13:00 on August 3 (, 
different averages: (a) 10 minutes, (b) 5 mm 
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5 Minute Average for Aug 03 



11.0 11.5 12.0 12.5 13.0 

hour of day 


(b) 


30 Second Average for Aug 03 



11.0 11.5 12.0 12.5 13.0 

hour of day 


(d) 


idiance and the fit of the FCSAR model for 


overcast day). The different plots are for 
3, (c) 1 minute, and (d) 30 second. 







Data Boxplots for Apr 01 
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Residual Boxplots for Apr 01 
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(b) 

Figure 12: Boxplots for (a) the observed detrended data and (b) the residuals of the fit of 
the FCSAR model with b — 2 for April 1 (a partly cloudy day). 
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